Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats (Online Appendix)
Similar resources
Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats
This paper was motivated by the problem of developing an optimal strategy for exploring a large oil and gas field in the North Sea. Where should we drill first? Where do we drill next? The problem resembles a classical multiarmed bandit problem, but probabilistic dependence plays a key role: outcomes at drilled sites reveal information about neighboring targets. Good exploration strategies will...
Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces
We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper s...
Exploration-Free Policies in Dynamic Pricing and Online Decision-Making
Growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individual-specific covariates or features. Recently, contextual bandits have been introduced as a framework to study these online and sequential decision making problems. This literature predominantly focuses on algorithms that ba...
Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits
We present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.
An Optimal Algorithm for Linear Bandits
We provide the first algorithm for online bandit linear optimization whose regret after $T$ rounds is of order $\sqrt{Td \ln N}$ on any finite class $X \subseteq \mathbb{R}^d$ of $N$ actions, and of order $d\sqrt{T}$ (up to log factors) when $X$ is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an app...
Publication date: 2013